Unsupervised Discovery of Compound Entities for Relationship Extraction
نویسندگان
چکیده
In this paper we investigate unsupervised population of a biomedical ontology via information extraction from biomedical literature. Relationships in text seldom connect simple entities. We therefore focus on identifying compound entities rather than mentions of simple entities. We present a method based on rules over grammatical dependency structures for unsupervised segmentation of sentences into compound entities and relationships. We complement the rule-based approach with a statistical component that prunes structures with low information content, thereby reducing false positives in the prediction of compound entities, their constituents and relationships. The extraction is manually evaluated with respect to the UMLS Semantic Network by analyzing the conformance of the extracted triples with the corresponding UMLS relationship type definitions.
منابع مشابه
Structural Linguistics and Unsupervised Information Extraction
A precondition for extracting information from large text corpora is discovering the information structures underlying the text. Progress in this direction is being made in the form of unsupervised information extraction (IE). We describe recent work in unsupervised relation extraction and compare its goals to those of grammar discovery for science sublanguages. We consider what this work on gr...
متن کاملUnsupervised Relation Extraction with General Domain Knowledge
In this paper we present an unsupervised approach to relational information extraction. Our model partitions tuples representing an observed syntactic relationship between two named entities (e.g., “X was born in Y” and “X is from Y”) into clusters corresponding to underlying semantic relation types (e.g., BornIn, Located). Our approach incorporates general domain knowledge which we encode as F...
متن کاملKnowledge Discovery with CRF-Based Clustering of Named Entities without a Priori Classes
Knowledge discovery aims at bringing out coherent groups of entities. It is usually based on clustering which necessitates defining a notion of similarity between the relevant entities. In this paper, we propose to divert a supervised machine learning technique (namely Conditional Random Fields, widely used for supervised labeling tasks) in order to calculate, indirectly and without supervision...
متن کاملUnsupervised CRF for knowledge discovery (Découverte de connaissances dans les séquences par CRF non-supervisés) [in French]
Unsupervised CRF for knowledge discovery Knowledge discovery aims at bringing out coherent groups of entities. They are usually based on clustering ; the challenge is then to define a notion of similarity between the relevant entities. In this paper, we propose to divert Conditional Random Fields (CRF), which have shown their interest in supervised labeling tasks, in order tocalculate indirectl...
متن کاملSeeded Discovery of Base Relations in Large Corpora
Relationship discovery is the task of identifying salient relationships between named entities in text. We propose novel approaches for two sub-tasks of the problem: identifying the entities of interest, and partitioning and describing the relations based on their semantics. In particular, we show that term frequency patterns can be used effectively instead of supervised NER, and that the pmedi...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2008